Abstract

Valiant showed that Boolean matrix multiplication (BMM) can be used for CFG parsing. We prove a dual result: CFG parsers running in time $O(|G||w|^{3-\epsilon})$ on a grammar $G$ and a string $w$ can be used to multiply $m \times m$ Boolean matrices in time $O(m^{3-\epsilon/3})$. In the process we also provide a formal definition of parsing motivated by an informal notion due to Lang. Our result establishes one of the first limitations on general CFG parsing: a fast, practical CFG parser would yield a fast, practical BMM algorithm, which is not believed to exist.

1 Introduction 

The context-free grammar (CFG) formalism was developed during the birth of the field of computational linguistics. The standard methods for CFG parsing are the CKY algorithm (Kasami, 1965; Younger, 1967) and Earley's algorithm (Earley, 1970), both of which have a worst-case running time of $O(gN^3)$ for a CFG (in Chomsky normal form) of size $g$ and a string of length $N$. Graham et al. (1980) give a variant of Earley's algorithm which runs in time $O(gN^3/\log N)$. Valiant's parsing method is the asymptotically fastest known (Valiant, 1975). It uses Boolean matrix multiplication (BMM) to speed up the dynamic programming in the CKY algorithm: its worst-case running time is $O(gM(N))$, where $M(m)$ is the time it takes to multiply two $m \times m$ Boolean matrices together.

The standard method for multiplying matrices takes time $O(m^3)$. There exist matrix multiplication algorithms with sub-cubic time complexity, $O(m^{3-c})$ for some constant $c > 0$. For instance, Strassen's has a worst-case running time of $O(m^{2.81})$ (Strassen, 1969), and the fastest currently known has a worst-case running time of $O(m^{2.376})$ (Coppersmith and Winograd, 1990). Unfortunately, the constants involved are so large that these fast algorithms (with the possible exception of Strassen's) cannot be used in practice. As matrix multiplication is a very well-studied problem (see Strassen's historical account (Strassen, 1990, section 10)), it is highly unlikely that simple, practical fast matrix multiplication algorithms exist. Since the best BMM algorithms all rely on general matrix multiplication¹, it is widely believed that there are no practical sub-cubic BMM algorithms.

One might therefore hope to find a way to speed up CFG parsing without relying on matrix multiplication. However, we show in this paper that fast CFG parsing requires fast Boolean matrix multiplication in a precise sense: any parser running in time $O(gN^{3-\epsilon})$ that represents parse data in a retrieval-efficient way can be converted with little computational overhead into an $O(m^{3-\epsilon/3})$ BMM algorithm. Since it is very improbable that practical fast matrix multiplication algorithms exist, we thus establish one of the first nontrivial limitations on practical CFG parsing.

Our technique, adapted from that used by Satta (1994) for tree-adjoining grammar (TAG) parsing, is to show that BMM can be efficiently reduced to CFG parsing. Satta's result does not apply to CFG parsing, since it explicitly relies on the properties of TAGs that allow them to generate non-context-free languages.



2 Definitions 

A Boolean matrix is a matrix with entries from the set $\{0, 1\}$. A Boolean matrix multiplication algorithm takes as input two $m \times m$ Boolean matrices $A$ and $B$ and returns their Boolean product $A \times B$, which is the $m \times m$ Boolean matrix $C$ whose entries $c_{ij}$ are defined by

$$c_{ij} = \bigvee_{k=1}^{m} \left( a_{ik} \wedge b_{kj} \right).$$

That is, $c_{ij} = 1$ if and only if there exists a number $k$, $1 \le k \le m$, such that $a_{ik} = b_{kj} = 1$.

¹ The "four Russians" algorithm (Arlazarov et al., 1970), the fastest BMM algorithm that does not simply use ordinary matrix multiplication, has worst-case running time $O(m^3/\log m)$.
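For concreteness, the definition of the Boolean product can be transcribed directly into a short Python sketch. This is the standard $O(m^3)$ method, not one of the fast algorithms discussed in the introduction. The name `boolean_product` is chosen here for illustration only, and the code uses 0-based indices rather than the 1-based indices of the text.

```python
def boolean_product(A, B):
    """Return C = A x B for square 0/1 matrices given as lists of lists.

    c[i][j] is the OR over k of (a[i][k] AND b[k][j]): the standard
    cubic-time method, written with 0-based indices.
    """
    m = len(A)
    return [
        [int(any(A[i][k] and B[k][j] for k in range(m))) for j in range(m)]
        for i in range(m)
    ]


print(boolean_product([[1, 0], [0, 0]], [[0, 1], [1, 0]]))  # [[0, 1], [0, 0]]
```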

We use the usual definition of a context-free grammar (CFG) as a 4-tuple $G = (\Sigma, V, R, S)$, where $\Sigma$ is the set of terminals, $V$ is the set of nonterminals, $R$ is the set of productions, and $S \in V$ is the start symbol. Given a string $w = w_1 w_2 \cdots w_n$ over $\Sigma^*$, where each $w_i$ is an element of $\Sigma$, we use the notation $w_i^j$ to denote the substring $w_i w_{i+1} \cdots w_{j-1} w_j$.



We will be concerned with the notion of c-derivations, which are substring derivations that are consistent with a derivation of an entire string. Intuitively, $A \Rightarrow^* w_i^j$ is a c-derivation if it is consistent with at least one parse of $w$.

Definition 1 Let $G = (\Sigma, V, R, S)$ be a CFG, and let $w = w_1 w_2 \cdots w_n$, $w_i \in \Sigma$. A nonterminal $A \in V$ c-derives (consistently derives) $w_i^j$ if and only if the following conditions hold:

• $A \Rightarrow^* w_i^j$, and

• $S \Rightarrow^* w_1^{i-1} A w_{j+1}^n$.

(These conditions together imply that $S \Rightarrow^* w$.)

We would like our results to apply to all "practical" parsers, but what does it mean for a parser to be practical? First, we would like to be able to retrieve constituent information for all possible parses of a string (after all, the recovery of structural information is what distinguishes parsing algorithms from recognition algorithms). Such information is very useful for applications like natural language understanding, where multiple interpretations for a sentence may result from different constituent structures. Therefore, practical parsers should keep track of c-derivations. Secondly, a parser should create an output structure from which information about constituents can be retrieved in an efficient way. Satta (1994) points out an observation of Lang to the effect that one can consider the input string itself to be a retrieval-inefficient representation of parse information. In short, we require practical parsers to output a representation of the parse forest for a string that allows efficient retrieval of parse information. Lang in fact argues that parsing means exactly the production of a shared forest structure "from which any specific parse can be extracted in time linear with the size of the extracted parse tree" (Lang, 1994, pg. 487), and Satta (1994) makes this assumption as well.

These notions lead us to equate practical parsers with the class of c-parsers, which keep track of c-derivations and may also calculate general substring derivations.

Definition 2 A c-parser is an algorithm that takes a CFG $G = (\Sigma, V, R, S)$ and a string $w \in \Sigma^*$ as input and produces output $\mathcal{F}_{G,w}$. $\mathcal{F}_{G,w}$ acts as an oracle about parse information, as follows:

• If $A$ c-derives $w_i^j$, then $\mathcal{F}_{G,w}(A, i, j) =$ "yes".

• If $A \not\Rightarrow^* w_i^j$ (which implies that $A$ does not c-derive $w_i^j$), then $\mathcal{F}_{G,w}(A, i, j) =$ "no".

• $\mathcal{F}_{G,w}$ answers queries in constant time.

Note that the answer $\mathcal{F}_{G,w}$ gives can be arbitrary if $A \Rightarrow^* w_i^j$ but $A$ does not c-derive $w_i^j$. The constant-time constraint encodes the notion that information extraction is efficient; observe that this is a stronger condition than that called for by Lang.

We define c-parsers in this way to make the class of c-parsers as broad as possible. If we had changed the first condition to "If $A$ derives $w_i^j$ ...", then Earley parsers would be excluded, since they do not keep track of all substring derivations. If we had written the second condition as "If $A$ does not c-derive $w_i^j$, then ...", then CKY parsers would not be c-parsers, since they keep track of all substring derivations, not just c-derivations. So as it stands, the class of c-parsers includes tabular parsers (e.g., CKY), where $\mathcal{F}_{G,w}$ is the table of substring derivations, and Earley-type parsers, where $\mathcal{F}_{G,w}$ is the chart. Indeed, it includes all of the parsing algorithms mentioned in the introduction, and can be thought of as a formalization of Lang's informal definition of parsing.
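To make Definition 2 concrete, here is a minimal sketch of a tabular c-parser in the spirit of the CKY case just discussed. It assumes a CNF grammar supplied as a dictionary from terminals to nonterminals plus a list of binary rules. That representation and the name `cky_oracle` are illustrative choices, not anything defined in the paper. Because the table records every substring derivation, the oracle says "yes" on all c-derivations and "no" on every non-derivation. A hash set provides (expected) constant-time queries.

```python
def cky_oracle(lexical, binary, w):
    """Return a query function F(A, i, j) backed by a CKY table (a sketch).

    lexical: dict mapping a terminal t to the set of A with a rule A -> t
    binary:  list of (A, B, C) triples, one per rule A -> B C
    w:       the input string as a list of terminals
    Positions i, j in queries are 1-based and inclusive, as in the text.
    """
    N = len(w)
    table = {}                                   # (i, j) -> nonterminals deriving w_i^j
    for i, t in enumerate(w, start=1):
        table[(i, i)] = set(lexical.get(t, ()))
    for length in range(2, N + 1):
        for i in range(1, N - length + 2):
            j = i + length - 1
            table[(i, j)] = {
                A
                for k in range(i, j)             # split into w_i^k and w_{k+1}^j
                for A, B, C in binary
                if B in table[(i, k)] and C in table[(k + 1, j)]
            }
    # Every substring derivation is recorded, so c-derivations get "yes" and
    # non-derivations get "no"; a set gives expected constant-time lookups.
    derivable = {(A, i, j) for (i, j), cell in table.items() for A in cell}

    def F(A, i, j):
        return "yes" if (A, i, j) in derivable else "no"

    return F


# Toy CNF grammar: S -> X Y, X -> a, Y -> b, plus an unreachable rule Z -> a.
F = cky_oracle({"a": {"X", "Z"}, "b": {"Y"}}, [("S", "X", "Y")], ["a", "b"])
print(F("S", 1, 2), F("X", 1, 1), F("Y", 1, 1))  # yes yes no
print(F("Z", 1, 1))  # yes
```

The last query shows the freedom that Definition 2 allows. $Z$ derives $w_1^1$ without c-deriving it, because $S$ never introduces $Z$, and a tabular parser is still permitted to answer "yes" there.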

3 The reduction 

We will reduce BMM to c-parsing, thus proving that any c-parsing algorithm can be used as a Boolean matrix multiplication algorithm. Our method, adapted from that of Satta (1994) (who considered the problem of parsing with tree-adjoining grammars), is to encode information about Boolean matrices into a CFG. Thus, given two Boolean matrices, we need to produce a string and a grammar such that parsing the string with respect to the grammar yields output from which information about the product of the two matrices can be easily retrieved.

We can sketch the behavior of the grammar as follows. Suppose entries $a_{ik}$ in $A$ and $b_{kj}$ in $B$ are both 1. Assume we have some way to break up array indices into two parts so that $i$ can be reconstructed from $i_1$ and $i_2$, $j$ can be reconstructed from $j_1$ and $j_2$, and $k$ can be reconstructed from $k_1$ and $k_2$. (We will describe a way to do this later.) Then, we will have the following derivation (for a quantity $\delta$ to be defined later):

$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} \cdots w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \; \underbrace{w_{k_2+\delta+1} \cdots w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}}$$

The key thing to observe is that $C_{i_1,j_1}$ generates two nonterminals whose "inner" indices match, and that these two nonterminals generate substrings that lie exactly next to each other. The "inner" indices constitute a check on $k_1$, and the substring adjacency constitutes a check on $k_2$.

Let $A$ and $B$ be two Boolean matrices, each of size $m \times m$, and let $C$ be their Boolean matrix product, $C = A \times B$. In the rest of this section, we consider $A$, $B$, $C$, and $m$ to be fixed. Set $n = \lceil m^{1/3} \rceil$, and set $\delta = n + 2$. We will be constructing a string of length $3\delta$; we choose $\delta$ slightly larger than $n$ in order to avoid having epsilon-productions in our grammar.

Recall that $c_{ij}$ is non-zero if and only if we can find a non-zero $a_{ik}$ and a non-zero $b_{\tilde{k}j}$ such that $k = \tilde{k}$. In essence, we simply need to check the indices $k$ and $\tilde{k}$ for equality. We will break matrix indices into two parts: our grammar will check whether the first parts of $k$ and $\tilde{k}$ are equal, and our string will check whether the second parts are also equal, as we sketched above. Encoding the indices ensures that the grammar is as small as possible, which will be important for our time bound results.

Our index encoding function is as follows. Let $i$ be a matrix index, $1 \le i \le m$. Then we define the function $f(i) = (f_1(i), f_2(i))$ by

$$f_1(i) = \lfloor i/n \rfloor \quad (0 \le f_1(i) \le n^2), \text{ and}$$

$$f_2(i) = (i \bmod n) + 2 \quad (2 \le f_2(i) \le n + 1).$$

Since $f_1$ and $f_2$ are essentially the quotient and remainder of integer division of $i$ by $n$, we can retrieve $i$ from $(f_1(i), f_2(i))$, namely $i = n f_1(i) + f_2(i) - 2$. We will use the notational shorthand of using subscripts instead of the functions $f_1$ and $f_2$; that is, we write $i_1$ and $i_2$ for $f_1(i)$ and $f_2(i)$.
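The encoding and its inverse can be checked mechanically. The sketch below uses hypothetical helper names (`icbrt_ceil`, `encode`, `decode`). It computes $n = \lceil m^{1/3} \rceil$ with an integer correction step, so floating-point error in the cube root cannot give a wrong $n$, and it then confirms the stated ranges and invertibility for small $m$.

```python
def icbrt_ceil(m):
    """Smallest n >= 1 with n**3 >= m, i.e. n = ceil(m^(1/3)), in exact integers."""
    n = max(1, round(m ** (1 / 3)))   # floating-point guess, corrected below
    while n ** 3 < m:
        n += 1
    while n > 1 and (n - 1) ** 3 >= m:
        n -= 1
    return n


def encode(i, n):
    """f(i) = (f_1(i), f_2(i)) = (floor(i / n), (i mod n) + 2)."""
    return i // n, i % n + 2


def decode(i1, i2, n):
    """Inverse of encode: i = n * i_1 + (i_2 - 2)."""
    return n * i1 + (i2 - 2)


m = 10
n = icbrt_ceil(m)                          # n = 3, since 2**3 < 10 <= 3**3
print([encode(i, n) for i in (1, 3, 10)])  # [(0, 3), (1, 2), (3, 3)]
```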
It is now our job to create a CFG $G = (\Sigma, V, R, S)$ and a string $w$ that encode information about $A$ and $B$ and express constraints about their product $C$. Our plan is to include a set of nonterminals $\{C_{p,q} : 0 \le p, q \le n^2\}$ in $V$ so that $c_{ij} = 1$ if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$. (The index range starts at 0 because $f_1$ can take the value 0.) In section 3.1 we describe a version of $G$ and prove it has this c-derivation property. Then, in section 3.2 we explain that $G$ can easily be converted to Chomsky normal form in such a way as to preserve c-derivations.

We choose the set of terminals to be $\Sigma = \{w_\ell : 1 \le \ell \le 3n + 6\}$, and choose the string to be parsed to be $w = w_1 w_2 \cdots w_{3n+6}$. We consider $w$ to be made up of three parts, $x$, $y$, and $z$, each of size $\delta$:

$$w = \underbrace{w_1 w_2 \cdots w_{n+2}}_{x} \; \underbrace{w_{n+3} \cdots w_{2n+4}}_{y} \; \underbrace{w_{2n+5} \cdots w_{3n+6}}_{z}.$$

Observe that for any $i$, $1 \le i \le m$, $w_{i_2}$ lies within $x$, $w_{i_2+\delta}$ lies within $y$, and $w_{i_2+2\delta}$ lies within $z$, since

$$i_2 \in [2, n+1], \quad i_2 + \delta \in [n+4, 2n+3], \quad \text{and} \quad i_2 + 2\delta \in [2n+6, 3n+5].$$
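The interval arithmetic above can also be verified exhaustively for small $n$. In the sketch below, `parts` and `check_positions` are hypothetical helpers, and positions are 1-based and inclusive, as in the text.

```python
def parts(n):
    """1-based inclusive position ranges of x, y, z in w = w_1 ... w_{3n+6}."""
    delta = n + 2
    x = range(1, delta + 1)                   # w_1 .. w_{n+2}
    y = range(delta + 1, 2 * delta + 1)       # w_{n+3} .. w_{2n+4}
    z = range(2 * delta + 1, 3 * delta + 1)   # w_{2n+5} .. w_{3n+6}
    return delta, x, y, z


def check_positions(n):
    """Each possible i_2 in [2, n+1] puts w_{i_2}, w_{i_2+delta}, w_{i_2+2delta} in x, y, z."""
    delta, x, y, z = parts(n)
    assert len(x) == len(y) == len(z) == delta and z[-1] == 3 * n + 6
    for i2 in range(2, n + 2):
        assert i2 in x and i2 + delta in y and i2 + 2 * delta in z
    return True


print(all(check_positions(n) for n in range(1, 50)))  # True
```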

3.1 The grammar 

Now we begin building the grammar $G = (\Sigma, V, R, S)$. We start with the nonterminals $V = \{S\}$ and the production set $R = \emptyset$. We add nonterminal $W$ to $V$ for generating arbitrary non-empty substrings of $w$; thus we need the productions

$$W \to w_\ell W \mid w_\ell, \quad 1 \le \ell \le 3n + 6. \qquad \text{(W-rules)}$$



Next we encode the entries of the input matrices $A$ and $B$ in our grammar. We include sets of nonterminals $\{A_{p,q} : 0 \le p, q \le n^2\}$ and $\{B_{p,q} : 0 \le p, q \le n^2\}$. Then, for every non-zero entry $a_{ij}$ in $A$, we add the production

$$A_{i_1,j_1} \to w_{i_2} \, W \, w_{j_2+\delta}. \qquad \text{(A-rules)}$$

For every non-zero entry $b_{ij}$ in $B$, we add the production

$$B_{i_1,j_1} \to w_{i_2+1+\delta} \, W \, w_{j_2+2\delta}. \qquad \text{(B-rules)}$$



We need to represent entries of $C$, so we create nonterminals $\{C_{p,q} : 0 \le p, q \le n^2\}$ and productions

$$C_{p,q} \to A_{p,r} B_{r,q}, \quad 0 \le p, q, r \le n^2. \qquad \text{(C-rules)}$$

Finally, we complete the construction with productions for the start symbol $S$:

$$S \to W C_{p,q} W, \quad 0 \le p, q \le n^2. \qquad \text{(S-rules)}$$
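The construction of $G$ and $w$ translates into a short Python sketch. The representation is an assumption made for illustration only. Terminal $w_\ell$ is the integer $\ell$, nonterminals are tuples such as `("A", p, q)`, and each production is a `(lhs, rhs)` pair. The function names are hypothetical, and the index range $0 \le p, q, r \le n^2$ follows the range of $f_1$.

```python
def icbrt_ceil(m):
    """Smallest n >= 1 with n**3 >= m, i.e. n = ceil(m^(1/3)), in exact integers."""
    n = max(1, round(m ** (1 / 3)))
    while n ** 3 < m:
        n += 1
    while n > 1 and (n - 1) ** 3 >= m:
        n -= 1
    return n


def build_grammar(A, B):
    """Rules R and string w of section 3.1 for 0/1 matrices A, B (lists of lists)."""
    m = len(A)
    n = icbrt_ceil(m)
    delta = n + 2

    def f(i):                                       # f(i) = (i_1, i_2)
        return i // n, i % n + 2

    W, S = ("W",), ("S",)
    rules = []
    for ell in range(1, 3 * n + 7):                 # W-rules: W -> w_l W | w_l
        rules += [(W, (ell, W)), (W, (ell,))]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            (i1, i2), (j1, j2) = f(i), f(j)
            if A[i - 1][j - 1]:                     # A-rule for a_ij = 1
                rules.append((("A", i1, j1), (i2, W, j2 + delta)))
            if B[i - 1][j - 1]:                     # B-rule for b_ij = 1
                rules.append((("B", i1, j1), (i2 + 1 + delta, W, j2 + 2 * delta)))
    P = range(n * n + 1)                            # possible first parts 0..n^2
    for p in P:
        for q in P:
            rules.append((S, (W, ("C", p, q), W)))  # S-rules
            for r in P:                             # C-rules: C_pq -> A_pr B_rq
                rules.append((("C", p, q), (("A", p, r), ("B", r, q))))
    w = list(range(1, 3 * n + 7))                   # w = w_1 w_2 ... w_{3n+6}
    return rules, w, n, delta


rules, w, n, delta = build_grammar([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(n, delta, len(w), len(rules))  # 2 4 12 178
```

Counting the rules by kind gives $2(3n+6)$ W-rules, one A-rule or B-rule per non-zero entry, $(n^2+1)^2$ S-rules, and $(n^2+1)^3$ C-rules. For $m = 2$ that is $24 + 2 + 2 + 25 + 125 = 178$.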

We now prove the following result about the gram- 
mar and string we have just described. 

Theorem 1 For $1 \le i, j \le m$, the entry $c_{ij}$ in $C$ is non-zero if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.

Proof. Fix i and j. 

Let us prove the "only if" direction first. Thus, suppose $c_{ij} = 1$. Then there exists a $k$ such that $a_{ik} = b_{kj} = 1$. Figure 1 sketches how $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.




Figure 1: Schematic of the derivation process when $a_{ik} = b_{kj} = 1$. The substrings derived by $A_{i_1,k_1}$ and $B_{k_1,j_1}$ lie right next to each other.



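Claim 1 $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$.

The production $C_{i_1,j_1} \to A_{i_1,k_1} B_{k_1,j_1}$ is one of the C-rules in our grammar. Since $a_{ik} = 1$, $A_{i_1,k_1} \to w_{i_2} W w_{k_2+\delta}$ is one of our A-rules, and since $b_{kj} = 1$, $B_{k_1,j_1} \to w_{k_2+1+\delta} W w_{j_2+2\delta}$ is one of our B-rules. Recall that $2 \le i_2, j_2, k_2 \le n + 1$ and $\delta = n + 2$, so that $i_2 + 1 \le n + 2 < (k_2 + \delta) - 1$ and $(k_2 + 1 + \delta) + 1 \le 2n + 5 \le (j_2 + 2\delta) - 1$. Hence both $w_{i_2+1}^{k_2+\delta-1}$ and $w_{k_2+2+\delta}^{j_2+2\delta-1}$ have length at least one, and so $W \Rightarrow^* w_{i_2+1}^{k_2+\delta-1}$ and $W \Rightarrow^* w_{k_2+2+\delta}^{j_2+2\delta-1}$. Therefore,

$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} W w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \; \underbrace{w_{k_2+1+\delta} W w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}} \Rightarrow^* w_{i_2}^{j_2+2\delta},$$

and Claim 1 follows.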
Claim 2 $S \Rightarrow^* w_1^{i_2-1} C_{i_1,j_1} w_{j_2+2\delta+1}^{3n+6}$.

This claim is essentially trivial, since by the definition of the S-rules, we know that $S \Rightarrow W C_{i_1,j_1} W$. We need only show that neither $w_1^{i_2-1}$ nor $w_{j_2+2\delta+1}^{3n+6}$ is the empty string (and hence each can be derived by $W$). Since $1 \le i_2 - 1$ and $j_2 + 2\delta + 1 \le 3n + 6$, the claim holds.

Claims 1 and 2 together prove that $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, as required.²

² This proof would have been simpler if we had allowed $W$ to derive the empty string. However, we avoid epsilon-productions in order to facilitate the conversion to Chomsky normal form, discussed later.



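Next we prove the "if" direction. Suppose $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, which by definition means $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Then there must be a derivation resulting from the application of a C-rule as follows:

$$C_{i_1,j_1} \Rightarrow A_{i_1,k'} B_{k',j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$$

for some $k'$. It must be the case that for some $\ell$, $A_{i_1,k'} \Rightarrow^* w_{i_2}^{\ell}$ and $B_{k',j_1} \Rightarrow^* w_{\ell+1}^{j_2+2\delta}$. But then we must have the productions $A_{i_1,k'} \to w_{i_2} W w_\ell$ and $B_{k',j_1} \to w_{\ell+1} W w_{j_2+2\delta}$ with $\ell = k'' + \delta$ for some $k''$. But we can only have such productions if there exists a number $k$ such that $k_1 = k'$, $k_2 = k''$, $a_{ik} = 1$, and $b_{kj} = 1$; and this implies that $c_{ij} = 1$.

To see why, note that the A-rule can only come from a non-zero entry $a_{i'k}$ with $f(i') = (i_1, i_2)$, so $i' = i$, $k_1 = k'$, and $k_2 = k''$. Likewise, the B-rule can only come from a non-zero entry $b_{\tilde{k}j'}$ with $f(j') = (j_1, j_2)$, $\tilde{k}_1 = k'$, and $\tilde{k}_2 + 1 + \delta = \ell + 1$. Hence $j' = j$ and $\tilde{k}_2 = k''$, and since $f$ is invertible, $\tilde{k} = k$. ■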

Examination of the proof reveals that we have also 
shown the following two corollaries. 

Corollary 1 For $1 \le i, j \le m$, $c_{ij} = 1$ if and only if $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$.

Corollary 2 $S \Rightarrow^* w$ if and only if $C$ is not the all-zeroes matrix.

Let us now calculate the size of $G$. $V$ consists of $O((n^2)^2) = O(m^{4/3})$ nonterminals. $R$ contains $O(n)$ W-rules and $O((n^2)^2) = O(m^{4/3})$ S-rules. There are at most $m^2$ A-rules, since we have an A-rule for each non-zero entry in $A$; similarly, there are at most $m^2$ B-rules. And lastly, there are $(n^2 + 1)^3 = O(m^2)$ C-rules. Therefore, our grammar is of size $O(m^2)$. Since $G$ encodes matrices $A$ and $B$, it is of optimal size.



3.2 Chomsky normal form 

We would like our results to be true for the largest possible class of parsers. Since some parsers require the input grammar to be in Chomsky normal form (CNF), we wish to construct a CNF version $G'$ of $G$. However, in order to preserve time bounds, we want $O(|G'|) = O(|G|)$, and we also require that Theorem 1 hold for $G'$ as well as for $G$.

The standard algorithm for converting CFGs to CNF can yield a quadratic blow-up in the size of the grammar and is therefore unsatisfactory for our purposes. However, since $G$ contains no epsilon-productions or unit productions, it is easy to see that we can convert $G$ simply by introducing a small ($O(n)$) number of nonterminals without changing any c-derivations for the $C_{p,q}$. Thus, from now on we will simply assume that $G$ is in CNF.
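The text only asserts that this conversion is easy. The scheme sketched below is one possible choice, offered as an illustrative assumption. Each terminal inside a longer right-hand side gets a new nonterminal $T_\ell \to w_\ell$. Each A-rule or B-rule $X \to w_a W w_b$ becomes $X \to T_a Y_b$ with $Y_b \to W T_b$. All S-rules share one new nonterminal $S'$, via $S \to S' W$ and $S' \to W C_{p,q}$. The new nonterminals are therefore the $T_\ell$, the $Y_\ell$, and $S'$, which is $O(n)$ of them, and the rule count grows by at most a constant factor. The names `to_cnf`, `T`, `Y`, and `S'` are hypothetical, and the grammar representation (integers for terminals, tuples for nonterminals) is an assumption.

```python
def to_cnf(rules):
    """One possible CNF conversion for the section 3.1 grammar (a sketch).

    Terminals are ints l (standing for w_l); nonterminals are tuples.
    Assumed rewriting scheme:
      W -> l W      becomes  W -> T_l W
      X -> a W b    becomes  X -> T_a Y_b  with  Y_b -> W T_b   (A- and B-rules)
      S -> W C W    becomes  S -> S' W     with  S' -> W C      (S-rules)
      T_l -> l is added for every terminal that was wrapped in a T_l.
    W -> l and C -> A B are already in CNF and are kept unchanged.
    """
    S, S1 = ("S",), ("S'",)

    def T(ell):
        return ("T", ell)

    def Y(ell):
        return ("Y", ell)

    out, wrapped = set(), set()
    for lhs, rhs in rules:
        if len(rhs) == 1 or (len(rhs) == 2 and not isinstance(rhs[0], int)):
            out.add((lhs, rhs))                      # X -> l  or  X -> Y Z
        elif len(rhs) == 2:                          # W -> l W
            out.add((lhs, (T(rhs[0]), rhs[1])))
            wrapped.add(rhs[0])
        elif lhs == S:                               # S -> W C W
            out.add((S, (S1, rhs[2])))
            out.add((S1, rhs[:2]))
        else:                                        # X -> a W b
            a, mid, b = rhs
            out.add((lhs, (T(a), Y(b))))
            out.add((Y(b), (mid, T(b))))
            wrapped.update((a, b))
    out.update((T(ell), (ell,)) for ell in wrapped)
    return out


W, S = ("W",), ("S",)
toy = [(W, (1, W)), (W, (1,)), (W, (2, W)), (W, (2,)),
       (("A", 0, 0), (1, W, 2)), (("B", 0, 0), (2, W, 2)),
       (("C", 0, 0), (("A", 0, 0), ("B", 0, 0))), (S, (W, ("C", 0, 0), W))]
cnf = to_cnf(toy)
print(sorted({lhs for lhs, _ in cnf} - {lhs for lhs, _ in toy}))
```

Sharing $S'$ is the important design choice. If each S-rule were binarized separately, for example as $S \to W Z_{p,q}$ with $Z_{p,q} \to C_{p,q} W$, the conversion would add $\Theta(n^4)$ new nonterminals instead of $O(n)$.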

3.3 Time bounds 

We are now in a position to prove our relation be- 
tween time bounds for Boolean matrix multiplica- 
tion and time bounds for CFG parsing. 

Theorem 2 Any c-parser $P$ with running time $O(T(g)t(N))$ on grammars of size $g$ and strings of length $N$ can be converted into a BMM algorithm $M_P$ that runs in time $O(\max(m^2, T(m^2)t(m^{1/3})))$. In particular, if $P$ takes time $O(gN^{3-\epsilon})$, then $M_P$ runs in time $O(m^{3-\epsilon/3})$.

Proof. $M_P$ acts as follows. Given two Boolean $m \times m$ matrices $A$ and $B$, it constructs $G$ and $w$ as described above. It feeds $G$ and $w$ to $P$, which outputs $\mathcal{F}_{G,w}$. To compute the product matrix $C$, $M_P$ queries, for each $i$ and $j$ with $1 \le i, j \le m$, whether $C_{i_1,j_1}$ derives $w_{i_2}^{j_2+2\delta}$, and sets $c_{ij}$ accordingly. We do not need to ask whether $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$: by Theorem 1 and Corollary 1, it derives this substring exactly when it c-derives it, so the oracle's answer is never one of the arbitrary ones. By definition of c-parsers, each such query takes constant time.

Let us compute the running time of $M_P$. It takes $O(m^2)$ time to read the input matrices. Since $G$ is of size $O(m^2)$ and $|w| = O(m^{1/3})$, it takes $O(m^2)$ time to build the input to $P$, which then computes $\mathcal{F}_{G,w}$ in time $O(T(m^2)t(m^{1/3}))$. Retrieving $C$ takes $O(m^2)$ time. So the total time spent by $M_P$ is $O(\max(m^2, T(m^2)t(m^{1/3})))$, as was claimed.

In the case where $T(g) = g$ and $t(N) = N^{3-\epsilon}$, $M_P$ has a running time of $O(m^2 (m^{1/3})^{3-\epsilon}) = O(m^{2+1-\epsilon/3}) = O(m^{3-\epsilon/3})$. ■
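As a sanity check of the reduction, the sketch below implements $M_P$ end to end on small matrices. The stand-in for $P$ is a simple chart recognizer of our own, not a parser from the paper. Like CKY, it records every substring derivation in a dictionary, so by Definition 2 it qualifies as a c-parser. It works directly on the grammar of section 3.1, without CNF conversion, because that grammar has no epsilon-productions or unit productions. It makes no attempt to meet any particular time bound: it only illustrates Corollary 1 and the query step of $M_P$. All function names and the grammar representation (integers for terminals, tuples for nonterminals) are hypothetical.

```python
def icbrt_ceil(m):
    """Smallest n >= 1 with n**3 >= m (exact integer cube-root ceiling)."""
    n = max(1, round(m ** (1 / 3)))
    while n ** 3 < m:
        n += 1
    while n > 1 and (n - 1) ** 3 >= m:
        n -= 1
    return n


def build_grammar(A, B):
    """G and w from section 3.1; terminal w_l is the int l, nonterminals are tuples."""
    m = len(A)
    n = icbrt_ceil(m)
    d = n + 2                                          # delta
    W, S = ("W",), ("S",)
    rules = []
    for ell in range(1, 3 * n + 7):                    # W-rules
        rules += [(W, (ell, W)), (W, (ell,))]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            i1, i2, j1, j2 = i // n, i % n + 2, j // n, j % n + 2
            if A[i - 1][j - 1]:                        # A-rules
                rules.append((("A", i1, j1), (i2, W, j2 + d)))
            if B[i - 1][j - 1]:                        # B-rules
                rules.append((("B", i1, j1), (i2 + 1 + d, W, j2 + 2 * d)))
    P = range(n * n + 1)
    for p in P:
        for q in P:
            rules.append((S, (W, ("C", p, q), W)))     # S-rules
            for r in P:                                # C-rules
                rules.append((("C", p, q), (("A", p, r), ("B", r, q))))
    return rules, list(range(1, 3 * n + 7)), n, d


def chart_parse(rules, w):
    """Stand-in c-parser: chart[(i, j)] holds every X with X =>* w[i:j] (0-based, half-open).

    Assumes no epsilon-productions or unit productions, which holds for G.
    """
    N = len(w)
    chart = {}

    def match(rhs, i, j):
        # Can the symbol sequence rhs derive w[i:j]?
        head, rest = rhs[0], rhs[1:]
        if isinstance(head, int):                      # terminal w_head
            if i >= j or w[i] != head:
                return False
            if not rest:
                return i + 1 == j
            return match(rest, i + 1, j)
        if not rest:                                   # last nonterminal spans all of w[i:j]
            return head in chart.get((i, j), ())
        return any(head in chart[(i, k)] and match(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    for length in range(1, N + 1):                     # shorter spans first
        for i in range(N - length + 1):
            chart[(i, i + length)] = {lhs for lhs, rhs in rules if match(rhs, i, i + length)}
    return chart


def multiply_via_parser(A, B):
    """M_P: build (G, w), parse once, then read off each c_ij with one query."""
    rules, w, n, d = build_grammar(A, B)
    chart = chart_parse(rules, w)
    m = len(A)
    C = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            i1, i2, j1, j2 = i // n, i % n + 2, j // n, j % n + 2
            # Corollary 1: c_ij = 1 iff C_{i1,j1} =>* w_{i2} ... w_{j2+2*delta};
            # 1-based inclusive positions i2..j2+2d map to the half-open key (i2-1, j2+2d).
            C[i - 1][j - 1] = int(("C", i1, j1) in chart[(i2 - 1, j2 + 2 * d)])
    return C


print(multiply_via_parser([[1, 0], [1, 1]], [[0, 1], [1, 0]]))  # [[0, 1], [1, 1]]
```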

The case in which $P$ takes time linear in the grammar size is of the most interest, since in natural language processing applications, the grammar tends to be far larger than the strings to be parsed. Observe that Theorem 2 translates the running time of the standard CFG parsers, $O(gN^3)$, into the running time of the standard BMM algorithm, $O(m^3)$. Also, a c-parser with running time $O(gN^{2.43})$ would yield a matrix multiplication algorithm rivalling Strassen's, and a c-parser with running time better than $O(gN^{1.12})$ could be converted into a BMM method faster than that of Coppersmith and Winograd. As discussed above, even if such parsers exist, they would in all likelihood not be very practical. Finally, we note that if a lower bound on BMM of the form $\Omega(m^{3-\alpha})$ were found, then we would have an immediate lower bound of $\Omega(N^{3-3\alpha})$ on c-parsers running in time linear in $g$.
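The exponent bookkeeping in this paragraph is simple arithmetic, and the hypothetical helpers below encode it. A parser exponent $e = 3 - \epsilon$ yields a BMM exponent $2 + e/3 = 3 - \epsilon/3$. Inverting this gives the parser exponent $e = 3(\omega - 2)$ needed to reach a BMM exponent $\omega$, a symbol introduced here only for convenience.

```python
def bmm_exponent(e):
    """BMM exponent from Theorem 2 for a c-parser running in O(g N^e): 2 + e/3."""
    return 2 + e / 3


def parser_exponent_needed(omega):
    """Parser exponent e with 2 + e/3 = omega, i.e. e = 3 * (omega - 2)."""
    return 3 * (omega - 2)


def parser_lower_bound(alpha):
    """A BMM lower bound Omega(m^(3 - alpha)) gives Omega(N^(3 - 3 alpha)) for parsers linear in g."""
    return 3 - 3 * alpha


print(bmm_exponent(3))                         # 3.0: cubic parsing <-> cubic BMM
print(round(parser_exponent_needed(2.81), 3))  # 2.43  (Strassen)
print(round(parser_exponent_needed(2.376), 3)) # 1.128 (Coppersmith and Winograd)
```

The exact threshold for Coppersmith and Winograd's exponent is about $1.128$, so any parser exponent below the $1.12$ quoted above certainly suffices.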

4 Related results and conclusion 

We have shown that fast practical CFG parsing algo- 
rithms yield fast practical BMM algorithms. Given 
that fast practical BMM algorithms are unlikely to 
exist, we have established a limitation on practical 
CFG parsing. 

Valiant (personal communication) notes that there is a reduction of $m \times m$ Boolean matrix multiplication checking to context-free recognition of strings of length $m^2$; this reduction is alluded to in a footnote of a paper by Harrison and Havel (1974). However, this reduction converts a parser running in time $O(|w|^{1.5})$ into a BMM checking algorithm running in time $O(m^3)$ (the running time of the standard multiplication method), whereas our result says that sub-cubic practical parsers are quite unlikely; thus, our result is considerably stronger.



Seiferas (1986) gives a simple proof of an $\Omega(N^2/\log N)$ lower bound (originally due to Gallaire (1969)) for the problem of on-line linear CFL recognition by multitape Turing machines. However, his results concern on-line recognition, which is a harder problem than parsing, and so do not apply to the general off-line parsing case.

Finally, we recall Valiant's reduction of CFG parsing to Boolean matrix multiplication (Valiant, 1975); it is rather pleasing to have the reduction cycle completed.